This year marks the centenary of the birth of Kolmogorov. It is a pleasure for me to
acknowledge this occasion by giving a lecture in connection with his life's work. My purpose
herein is certainly not to present a full historical study of Kolmogorov's output, but rather
to provide some remarks on specific mathematical topics in which he played an active role. As
you know, Kolmogorov produced some eight hundred publications encompassing all the main
fields of mathematics: functional analysis, ergodic theory, turbulence, probability theory and
statistics, and logic.





Andrei Nikolaievitch Kolmogorov 

He even delivered five seminal papers in the restricted domain of the foundations of probability
and stochastic processes between 1931 and 1936, which make him one of the founders of the theory
of continuous-time Markov processes or diffusions^1. The subject I would like to discuss pertains

^1 These significant articles are the following:

- Über die analytischen Methoden in der Wahrscheinlichkeitsrechnung, 1931,

- Beiträge zur Maßtheorie, 1933,

- Zur Theorie der stetigen zufälligen Prozesse, 1933,



to his famous "Grundbegriffe der Wahrscheinlichkeitsrechnung", which partially lies beyond his
main body of mathematical work; in some respects it serves as a manifesto for how to tackle
probability and probabilistic problems within the field of mathematics. I will provide
some remarks on axiomatized languages, illustrated by the cases of both probability theory and
error calculus with Dirichlet forms. Based on these two examples, my aim is to emphasize
how important it is, for a language to be useful, to have an extension tool readily available.

I. A brief history of random sequences theory 

In order to draw a comparison with Kolmogorov's axiomatic theory, it is helpful to explain
what the "theory of random sequences" became during the twentieth century. It did indeed
serve as an alternative way of incorporating probability into mathematics. Its purpose has been
to describe a sequence of independent samples of a given quantity. In the simplest case, the
theory pertains to samples of a random integer or even a single digit, so as to model the fair
game of heads and tails, in the one-half / one-half perfectly symmetric case^2.

I.1 The normal numbers of Borel (1909)

It is now easy to show, and Émile Borel was already able to prove it in 1909, that if we
represent a real number of the unit interval [0, 1] by its binary expansion

$$x = \sum_{n=0}^{\infty} \frac{a_n}{2^{n+1}}, \qquad a_n \in \{0, 1\},$$

then the independent one-half / one-half distribution of the digits corresponds to the Lebesgue
measure on the interval [0, 1].




Émile Borel



- Grundbegriffe der Wahrscheinlichkeitsrechnung, 1933,

- Zur Theorie der Markoffschen Ketten, 1936.

^2 This section is inspired by the very interesting study by Claude Dellacherie entitled "Nombres
au hasard, de Borel à Martin-Löf", Gazette des Mathématiciens n° 11, 1978.



As a consequence, for almost every real number $x \in [0, 1]$, the asymptotic frequency of any
finite sequence of digits in the binary expansion of $x$ is 1/2 to the power of the sequence length.
A real number fulfilling this property is said to be normal in the sense of Borel.

Now, proving that almost all real numbers are normal is just one step; another would be
to exhibit such a number! For the number π, determining whether it is normal or not
constitutes a famous open question. Borel actually put forward an effective, albeit sophisticated,
construction of a normal number.

In 1933 however, Champernowne showed that the sequence obtained by writing the integers
successively in dyadic representation is normal in the sense of Borel:

1 10 11 100 101 110 111 1000 1001 1010 1011 1100 1101 1110 1111 10000 ...

This clearly shows that the concept of a normal number does not capture the idea of a random
sequence very well.
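The point can be explored numerically. The following is only an illustrative sketch: the function names `champernowne_binary` and `block_frequency` are introduced here and do not come from the lecture. It builds the beginning of the dyadic sequence above and measures how often a given block of digits occurs.

```python
def champernowne_binary(n_integers):
    """Concatenate the binary writings of 1, 2, ..., n_integers."""
    return "".join(format(k, "b") for k in range(1, n_integers + 1))


def block_frequency(digits, block):
    """Frequency of `block` among all windows of the same length (overlaps allowed)."""
    windows = len(digits) - len(block) + 1
    hits = sum(1 for i in range(windows) if digits.startswith(block, i))
    return hits / windows


# All integers written with at most 12 binary digits.
digits = champernowne_binary(4095)
freq_one = block_frequency(digits, "1")    # tends to 1/2 as more integers are written
freq_01 = block_frequency(digits, "01")    # tends to 1/4 as more integers are written
```

On such a finite prefix the frequency of the digit 1 is still somewhat above one half, since every integer begins with a 1; the convergence is slow. The sequence is nonetheless produced by a trivial rule, which is exactly why normality alone is a poor notion of randomness.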

Already back in 1919, Von Mises had proposed an improvement to the definition of a
random sequence, by means of a new concept of "collective"^3 which sought to describe a typical
game of heads and tails. The idea is to ask for more than asymptotic averages and to think
of a player gambling only at some random times depending on the evolution of the game: a
sequence of digits is a "collective" if it satisfies the law of large numbers and if any subsequence
obtained by a non-anticipative selection rule also satisfies the law of large numbers. This
interesting approach, which portends the notion of "stopping time", does nevertheless have the
disadvantage of being difficult to apply in practical terms. A. Wald, one of the founders of
statistics and decision theory, proposed in 1937 the more precise notion of "collective relative
to a family of rules".
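A toy example may help to see what a non-anticipative selection rule rejects. The sketch below is an assumption-laden illustration, not Von Mises' or Wald's formalism: the helper `select` and the rule `after_a_zero` are hypothetical names. The periodic sequence 0101... has the right asymptotic frequency, yet a player who bets only right after a 0 sees nothing but 1s, so the sequence is not a collective.

```python
def select(digits, rule):
    """Keep digit number n only if rule(digits[:n]) is true.

    The rule sees the past digits only, never the digit it selects.
    """
    return "".join(d for n, d in enumerate(digits) if rule(digits[:n]))


def frequency_of_one(digits):
    """Proportion of 1s in a finite string of binary digits."""
    return digits.count("1") / len(digits)


def after_a_zero(past):
    """Hypothetical selection rule: bet only right after a 0 has come out."""
    return past.endswith("0")


periodic = "01" * 1000
selected = select(periodic, after_a_zero)

overall = frequency_of_one(periodic)       # law of large numbers holds: 0.5
on_selection = frequency_of_one(selected)  # but fails on the selected subsequence: 1.0
```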




R. Von Mises A. Wald A. Church 



Yet it would take the famous logician A. Church in 1940, with the first contribution from the
field of logic to the debate, to propose an "absolute notion of collective" that uses the set
of all non-anticipative rules that are effective in the sense of recursive function theory. It thus
appeared that the goal had been achieved by applying the new theory of effectiveness stemming from
the recent work of the logicians of the 1930's (Gödel, Turing, Church).

Over this same period however, just prior to the Second World War, unsuspected new
difficulties arose concerning the notion of "collective". In his work Étude critique de la notion
de collectif (1939), Jean Ville demonstrated that random sequences possess some probabilistic
properties that a "collective" may not always fulfill. A "collective" does not generally feature
the right magnitude of fluctuations. In his argument Jean Ville uses the modern concept of
mathematical martingale, whose theory would be further developed, by J. L. Doob in particular,
^3 "Grundlagen der Wahrscheinlichkeitsrechnung", Math. Zeitschrift 5, 52-99, 1919.



during the 1950's. By transferring the term martingale from gambling to mathematics, Ville
added a spark to this notion and likely contributed to its subsequent importance.

We would have to wait until the 1960's to obtain a satisfactory answer to the question of
random sequences. This answer came from mathematical logic and is owed to Martin-Löf^4.
Roughly speaking, a random sequence successfully passes all effective statistical randomness
tests. For a real number in [0, 1], being random in the sense of Martin-Löf signifies that it does
not belong to any effective Lebesgue negligible set in [0, 1]. Such a number cannot be given by
an algorithm; it is random in the sense of Church yet avoids Ville's critiques.

Although quite fascinating, the theory of random sequences remained useless for probabilists.
The outstanding twentieth century development of probability theory, which began as
a subsidiary field and became one of the primary domains of applied and even pure mathematics,
is based on another approach: the construction of a language for handling probabilistic
calculations.

We would like to examine the reason behind this language's fruitfulness. 

II. Axiomatization of Kolmogorov and σ-additivity

The paper entitled Grundbegriffe der Wahrscheinlichkeitsrechnung is an appeal to include
probabilistic calculus in measure theory. Kolmogorov does not presume this idea is new;
instead, he cites several authors who have already applied Lebesgue measure theory to
probabilistic investigations, in particular Borel, Fréchet, Steinhaus and Lévy. He did, however,
propose new arguments, which proved to be highly valuable for subsequent research: the
construction of probabilities on infinite dimensional spaces and the definition of conditional laws
and conditional expectations using the Radon-Nikodym theorem.




M. Fréchet H. Steinhaus P. Lévy



He did not consider axiomatization as a pure formal system, but rather as a language that
makes sense and that supports thought and reasoning. In remarking that "every
axiomatic theory admits, as is well known, an unlimited number of concrete interpretations"^5,
he emphasizes the intuitive interpretation of his axiomatization. He went on to display a
dictionary between random events and sets:



^4 "The definition of random sequences", Information and Control 9, 602-619 (1966).

^5 We are in 1933 here and the works of Löwenheim and Skolem (1915-1920) are already known, which prove
the existence of a countable model for any consistent theory.
6. Event A is impossible

7. Event A must occur

8. Possible results $A_1, A_2, \ldots, A_n$ of an experiment <-> Disjoint decomposition of E

9. From the occurrence of event B follows the inevitable occurrence of A <-> B is a subset of A


As for the axioms, the first five are elementary. Let $\mathcal{F}$ be a collection of subsets of a set $E$.

1. $\mathcal{F}$ is a field of sets

2. $\mathcal{F}$ contains the set $E$

3. To each set $A$ in $\mathcal{F}$ is assigned a non-negative real number $P(A)$, called the probability
of event $A$

4. $P(E)$ equals 1

5. If $A$ and $B$ have no elements in common, then $P(A + B) = P(A) + P(B)$

Kolmogorov underscores the importance of the sixth axiom: "In all future investigations we
shall assume that besides axioms 1 through 5, another axiom holds true as well:

6. For a decreasing sequence of events

$$A_1 \supset A_2 \supset \cdots \supset A_n \supset \cdots$$
in $\mathcal{F}$ for which $\bigcap_n A_n = \emptyset$, the following relation holds: $\lim_n P(A_n) = 0$".

This axiom of σ-additivity makes the probability P a measure in the sense of Lebesgue
and Borel, which then embeds probability theory into measure theory:

probability <-> measure

event <-> measurable set

random variable <-> measurable function

expectation <-> integral

independence <-> product of measurable spaces

conditional expectation <-> Radon-Nikodym derivative
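The reason why the sixth axiom deserves the name of σ-additivity can be sketched in a few lines. The derivation below assumes that the events $A_1, A_2, \ldots$ are pairwise disjoint and that their union $A$ belongs to the field; the remainder sets $R_n$ are a notation introduced here for illustration only.

```latex
\begin{align*}
R_n &= A \setminus (A_1 \cup \dots \cup A_n), \qquad
       R_1 \supset R_2 \supset \cdots, \qquad \bigcap_n R_n = \emptyset, \\
P(A) &= P(A_1) + \dots + P(A_n) + P(R_n)
       \quad \text{(axiom 5, applied finitely many times)}, \\
\lim_n P(R_n) &= 0 \quad \text{(axiom 6)}
       \quad \Longrightarrow \quad P(A) = \sum_{n=1}^{\infty} P(A_n).
\end{align*}
```

Countable additivity thus follows from finite additivity together with continuity along decreasing sequences with empty intersection.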

Let us note that as late as 1938, the philosopher Karl Popper, whose main education
stemmed from the field of psychology, was not convinced of the interest in placing probability
theory within the framework of measure theory. Even in 1955, he still seemed proud to
emphasize that a theory with only the first five axioms is more general. He wrote "Kolmogorov's
system can be taken, however, as one of the interpretations of mine"^6.

^6 K. Popper, The Logic of Scientific Discovery, Hutchinson, 1972, p. 319.



Karl Popper 



We know clearly now, thanks to the development of stochastic analysis over the twentieth
century, that σ-additivity is the key tool that makes this language extensible. It allows defining the
probability of events or the expectation of functions that are not given by simple closed formulae,
but rather by limits. This fact is of absolutely prime importance since many mathematical
objects are defined by limits and the methods for defining these converging sequences of objects
are not a priori restricted.

This paves the way to the study of stochastic processes: if we know only the probabilistic
properties of finitely many coordinates $X_i$ on a product space, then without σ-additivity
we cannot conclude anything about functions depending upon an infinite number of $X_i$'s.

Thanks to σ-additivity, connections with functional analysis may be developed, thereby
giving rise to probabilistic interpretations. For example, potential theory is connected with
the theory of Markov processes and with martingale theory. Let us recall that J. L. Doob proved his
extension of Fatou's lemma at the boundary from conical limits to non-tangential limits, first using
a probabilistic argument and then, one year later, by means of a purely analytical approach.

III. Error calculus with Dirichlet forms 

I would now like to present a more recent theory, in some respects a "cousin" to probability
theory, which also possesses a means of extension providing it with remarkable power and
fruitfulness. I have in mind the theory of Dirichlet forms with its interpretation in terms of errors.
I shall begin with the ideas of Gauss about errors, which are the elementary bases of the theory.

III.1. Gauss formulae for the propagation of errors

The ideas of Gauss were put forward at the beginning of the XIXth century, at a time when
several mathematicians were concerned with measurement errors, especially in the field of
celestial mechanics. First of all, Legendre (Nouvelles méthodes pour la détermination des orbites
des comètes, 1805) proposed the least squares principle to choose the best value of a quantity
obtained from several different measurements.



F. Gauss in 1803 



Legendre 



Laplace 



Secondly, Gauss himself (Theoria motus coelestium, 1809) elaborated the famous argument
proving (with some implicit hypotheses) that once it has been assumed that the arithmetic average
is the best value to retain from among several results of quantity measurements, then the
probability law of the error is necessarily the normal law. This argument was made
more rigorous by Poincaré at the end of the century. Thirdly, Laplace (Théorie analytique
des probabilités, 1811) demonstrated how the least squares method is useful for solving linear
systems when the number of equations does not agree with the number of unknowns.




F. Gauss in 1828 H. Poincaré



Within this same context, a few years later, Gauss became interested in the propagation of
errors through calculations (Theoria combinationis, 1821) and stated the following problem:

Given a quantity $U = F(V_1, V_2, V_3, \ldots)$, a function of the erroneous quantities $V_1, V_2, V_3, \ldots$,
compute the potential quadratic error to expect on $U$, with the quadratic errors $\sigma_1^2, \sigma_2^2, \sigma_3^2, \ldots$ on
$V_1, V_2, V_3, \ldots$ being known and assumed small and independent.

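His answer is the following formula:

$$\sigma_U^2 = \Big(\frac{\partial F}{\partial V_1}\Big)^2 \sigma_1^2 + \Big(\frac{\partial F}{\partial V_2}\Big)^2 \sigma_2^2 + \Big(\frac{\partial F}{\partial V_3}\Big)^2 \sigma_3^2 + \cdots \qquad (1)$$

He also provides the covariance between an error on $U$ and an error on another function of the
$V_i$'s.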
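As a numerical illustration of Gauss's rule (the squared partial derivatives of F weighted by the variances of the inputs), here is a minimal sketch. The function name `gauss_quadratic_error`, the finite-difference step and the example function `product` are all assumptions chosen for the illustration; they are not taken from Gauss or from the lecture.

```python
def gauss_quadratic_error(func, point, variances, step=1e-6):
    """Approximate sum_i (dF/dV_i)^2 * sigma_i^2 at `point`.

    Partial derivatives are estimated by central finite differences,
    and the input errors are assumed small and independent.
    """
    total = 0.0
    for i, variance in enumerate(variances):
        up = list(point)
        down = list(point)
        up[i] += step
        down[i] -= step
        partial = (func(*up) - func(*down)) / (2 * step)
        total += partial ** 2 * variance
    return total


def product(v1, v2):
    """Example quantity U = V1 * V2 (chosen only for illustration)."""
    return v1 * v2


# V1 = 3 with variance 0.01, V2 = 4 with variance 0.04:
# the rule gives 4^2 * 0.01 + 3^2 * 0.04 = 0.52.
error_on_u = gauss_quadratic_error(product, [3.0, 4.0], [0.01, 0.04])
```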

Formula (1) displays a property which makes it much to be preferred in several respects to
other formulae encountered in textbooks throughout the XIXth and XXth centuries. It features
a coherence property. With a formula such as

$$\sigma_U = \Big|\frac{\partial F}{\partial V_1}\Big| \sigma_1 + \Big|\frac{\partial F}{\partial V_2}\Big| \sigma_2 + \Big|\frac{\partial F}{\partial V_3}\Big| \sigma_3 + \cdots$$

errors may depend on the way in which the function F is written. Already in dimension 2, we
can note that if the identity map were written as the composition of an injective linear map
with its inverse, errors would be increased, which is hardly acceptable.
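This dimension 2 remark can be checked on a small example. The sketch below rests on assumptions made for the illustration: the matrix `A`, the helper names, and the use of full covariance propagation (Jacobian times covariance times transposed Jacobian) as the matrix form of Gauss's quadratic rule together with his covariance formula. It writes the identity as the inverse of A composed with A and compares the two calculi.

```python
def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def propagate_quadratic(jacobian, covariance):
    """Quadratic (Gauss-type) rule for a linear map: J C J^T."""
    return matmul(matmul(jacobian, covariance), transpose(jacobian))


def propagate_absolute(jacobian, sigmas):
    """Absolute-value rule: sigma_out_i = sum_j |J_ij| * sigma_j."""
    return [sum(abs(x) * s for x, s in zip(row, sigmas)) for row in jacobian]


A = [[1.0, 1.0], [1.0, -1.0]]        # an injective linear map (illustrative choice)
A_INV = [[0.5, 0.5], [0.5, -0.5]]    # its inverse

identity_cov = [[1.0, 0.0], [0.0, 1.0]]
cov_back = propagate_quadratic(A_INV, propagate_quadratic(A, identity_cov))
sig_back = propagate_absolute(A_INV, propagate_absolute(A, [1.0, 1.0]))
# cov_back is again the identity covariance, whereas sig_back is [2.0, 2.0]:
# the absolute-value rule has doubled the errors of the identity map.
```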

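This difficulty does not arise in Gauss' calculus. Introducing the differential operator

$$L = \frac{1}{2} \sigma_1^2 \frac{\partial^2}{\partial V_1^2} + \frac{1}{2} \sigma_2^2 \frac{\partial^2}{\partial V_2^2} + \cdots$$

and supposing the functions to be smooth, we remark that formula (1) can be written as

$$\sigma_U^2 = L(F^2) - 2 F L F$$

and coherence follows from the transport of a differential operator by a mapping. If $u$ and
$v$ are regular injective mappings, then, denoting by $\theta_u L$ the operator $\varphi \mapsto L(\varphi \circ u) \circ u^{-1}$, we
obtain $\theta_{v \circ u} L = \theta_v(\theta_u L)$.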
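The rewriting of Gauss's formula through the operator $L$ can be verified by a direct computation. The following sketch assumes that $L$ is the second-order operator $\frac12 \sum_i \sigma_i^2\, \partial^2 / \partial V_i^2$ with constant $\sigma_i$, and that $F$ is twice differentiable.

```latex
\begin{align*}
L(F^2) &= \frac{1}{2} \sum_i \sigma_i^2 \frac{\partial^2 (F^2)}{\partial V_i^2}
        = \sum_i \sigma_i^2 \Big(\frac{\partial F}{\partial V_i}\Big)^2
          + F \sum_i \sigma_i^2 \frac{\partial^2 F}{\partial V_i^2}, \\
2 F\, L F &= F \sum_i \sigma_i^2 \frac{\partial^2 F}{\partial V_i^2}, \\
L(F^2) - 2 F\, L F &= \sum_i \sigma_i^2 \Big(\frac{\partial F}{\partial V_i}\Big)^2 = \sigma_U^2 .
\end{align*}
```

The second-order terms cancel, so only first derivatives of $F$ remain, which is why the quadratic error behaves well under composition.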

The errors on $V_1, V_2, V_3, \ldots$ are not necessarily supposed to be independent or constant, and
may depend on $V_1, V_2, V_3, \ldots$ Considering a field of positive symmetric matrices $(\sigma_{ij}(v_1, v_2, \ldots))$
on $\mathbb{R}^n$ representing the conditional variances and covariances of the errors on $V_1, V_2, V_3, \ldots$ given
the values $v_1, v_2, v_3, \ldots$ of $V_1, V_2, V_3, \ldots$, the error on $U = F(V_1, V_2, V_3, \ldots)$ given these values is

$$\sigma_U^2 = \sum_{i,j} \frac{\partial F}{\partial V_i}(v_1, v_2, v_3, \ldots)\, \frac{\partial F}{\partial V_j}(v_1, v_2, v_3, \ldots)\, \sigma_{ij}(v_1, v_2, v_3, \ldots)$$

which depends solely on $F$ as a mapping. This is the general form of the error calculus à la Gauss.



III.2. Error propagation through calculations: the error calculus based on Dirichlet forms

The error calculus of Gauss has the limitation of supposing that both the function
$F$ and the random variables $V_1, V_2, V_3, \ldots$ are explicitly known. In probabilistic modelling
however, we are often confronted by a situation in which all the random variables, functions
and covariance matrices are given by limits. For such situations, a means of extension therefore
becomes essential.

Let the quantities be defined on the probability space $(\Omega, \mathcal{A}, \mathbb{P})$. The quadratic error on
a random variable $X$ is itself random; let us denote it $\Gamma[X]$. Intuitively speaking we still
assume that the errors are infinitely small, even though this assumption does not appear in
the notation. It is as though an infinitely small unit for measuring errors were available, fixed
throughout the entire problem. The extension tool lies in the following: we assume that if
$X_n \to X$ in $L^2(\Omega, \mathcal{A}, \mathbb{P})$ and if the error $\Gamma[X_m - X_n]$ on $X_m - X_n$ can be made as small as we
wish in $L^1(\Omega, \mathcal{A}, \mathbb{P})$ for $m, n$ large enough, then the error $\Gamma[X_n - X]$ on $X_n - X$ goes to zero
in $L^1$.

This idea can be interpreted as a reinforced coherence principle: it means that the error
on $X$ is attached to $X$ and furthermore, if the sequence of pairs ($X_n$, error on $X_n$) converges
suitably, it converges necessarily to a pair ($X$, error on $X$).

The axiomatization of this idea involves the notion of closed quadratic differential form or
Dirichlet form:



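An error structure is a term

$$(\Omega, \mathcal{A}, \mathbb{P}, \mathbb{D}, \Gamma)$$

where $(\Omega, \mathcal{A}, \mathbb{P})$ is a probability space, satisfying the following properties:

1) $\mathbb{D}$ is a dense subvector space of $L^2(\Omega, \mathcal{A}, \mathbb{P})$

2) $\Gamma$ is a positive symmetric bilinear map from $\mathbb{D} \times \mathbb{D}$ into $L^1(\mathbb{P})$ fulfilling the functional
calculus of class $C^1 \cap \mathrm{Lip}$, which means that if $u \in \mathbb{D}^m$, $v \in \mathbb{D}^n$, for $F$ and $G$ of class $C^1$ and
Lipschitz from $\mathbb{R}^m$ [resp. $\mathbb{R}^n$] into $\mathbb{R}$, one has $F \circ u \in \mathbb{D}$, $G \circ v \in \mathbb{D}$ and

$$\Gamma[F \circ u, G \circ v] = \sum_{i,j} F'_i \circ u \; G'_j \circ v \; \Gamma[u_i, v_j] \quad \mathbb{P}\text{-a.s.}$$

3) the bilinear form $\mathcal{E}[f, g] = \mathbb{E}[\Gamma[f, g]]$ is closed, i.e. $\mathbb{D}$ is complete under the norm

$$\|\cdot\|_{\mathbb{D}} = \big(\|\cdot\|_{L^2}^2 + \mathcal{E}[\cdot]\big)^{1/2}$$

(then the form $\mathcal{E}$ is a Dirichlet form.)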
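It may help to see how property 2) contains Gauss's calculus as a special case. The sketch below assumes $G = F$ and $v = u = (V_1, \ldots, V_n)$, and writes $\sigma_{ij}$ for $\Gamma[V_i, V_j]$, a notation chosen here by analogy with section III.1.

```latex
\begin{align*}
\Gamma[F \circ u, F \circ u]
  &= \sum_{i,j} (F'_i \circ u)\,(F'_j \circ u)\,\Gamma[V_i, V_j]
   = \sum_{i,j} \frac{\partial F}{\partial V_i}\,\frac{\partial F}{\partial V_j}\,\sigma_{ij}, \\
\Gamma[F \circ u, F \circ u]
  &= \sum_i \Big(\frac{\partial F}{\partial V_i}\Big)^2 \sigma_i^2
   \quad \text{when } \Gamma[V_i, V_j] = 0 \ (i \neq j) \text{ and } \Gamma[V_i, V_i] = \sigma_i^2 .
\end{align*}
```

The first line is the general error calculus à la Gauss; the second is his formula for independent errors. What the error structure adds is the closedness property 3), which is the extension tool.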

The main benefit of the extension tool is that error theory based on Dirichlet forms extends
to infinite dimension, which allows for error calculus on stochastic processes (especially on
Brownian motion but also on the Poisson space), provides several new results on stochastic
differential equations, and gives applications to fluctuations in physics and to sensitivity analysis
in finance^7.

IV. Languages with extension tools and Richard's paradox 

In comparing Kolmogorov's axiomatic theory of probability with the theory of random
sequences, we have emphasized for the former

- the presence of a language (syntax and semantics) 

- a powerful extension tool yielding, in some sense, risky results. 

This may be placed in analogy with the language of Analysis that handles real numbers.
We know, indeed, that there exist $2^{\aleph_0}$ real numbers, although only $\aleph_0$ will ever be indicated
with precision. This is the situation highlighted by Richard's paradox (1905).




Jules Antoine Richard (1862-1956) 

^7 See the books of Malliavin, Fukushima, Ikeda-Watanabe, Bismut, Bichteler-Gravereau-Jacod, Watanabe,
Stroock, Bouleau-Hirsch, Ma-Röckner, Nualart, Øksendal & al., Üstünel-Zakai, etc. and the papers of several
hundred researchers.

Regarding the interpretation in terms of error propagation, see N. Bouleau, Error Calculus for Finance and
Physics, the Language of Dirichlet Forms, De Gruyter, 235p, 2003.



The paradox can be stated as follows:

Let us write all of the pairs using the 28 characters (the 26 letters, the space and the comma
to separate words) in alphabetic order; then the triples, and so forth, all finite sequences. Every
definition of a real number will appear in the list.

Let us cross out all the sequences which are not definitions of real numbers.

Let $u_1$ be the real number defined by the first remaining definition;

$u_2$ the one defined by the following definition;

$u_3$ the one defined by the third one;

and so forth.

We thus obtain all the real numbers defined by finitely many words, written in a particular
order. The number given by the definition "the number with integer part zero, each decimal of
which immediately follows the decimal of the same rank of the number of the same rank in the sequence
($u_n$), zero being considered as following the numeral nine" should be in the list, but cannot
be equal to any number $u_n$.
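The digit rule used in this definition is a diagonal construction, which can be sketched on a finite list. The function name `diagonal_number` and the sample decimals are hypothetical; a real run of Richard's argument would need the infinite list ($u_n$), which no program can produce.

```python
def diagonal_number(expansions):
    """Decimals of a number whose n-th decimal follows the n-th decimal
    of the n-th listed number, 0 being taken as following 9."""
    return "".join(str((int(row[n]) + 1) % 10) for n, row in enumerate(expansions))


# Hypothetical first decimals of u_1, u_2, u_3, u_4.
listed = ["1415", "7182", "4142", "5000"]
diagonal = diagonal_number(listed)

# The new number differs from the n-th listed number at its n-th decimal.
differs_everywhere = all(diagonal[n] != row[n] for n, row in enumerate(listed))
```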

Mathematical logic is capable, of course, of overcoming the apparent contradiction in this
paradox. Nevertheless, a true phenomenon has indeed been highlighted: there are $2^{\aleph_0}$ real
numbers, we do not know how large this cardinal $2^{\aleph_0}$ actually is, and only $\aleph_0$ real numbers will
ever be precisely defined.

In such a situation, for mathematical Analysis, we have chosen a language with an extension
tool: the Cauchy criterion. This strategy allows handling real numbers defined by limits
regardless of how the convergent sequence used is constructed. This tool has then been carried
from the real case to the functional case by the notions of Hilbert space and Banach space,
which are certainly among the most powerful concepts of XXth century Analysis.



